First, let’s prep some data
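The examples on this page lean on a handful of packages. The setup chunk isn't shown here, so the list below is an assumption inferred from the functions used: penguins_raw from palmerpenguins, clean_names() from janitor, str_detect() from stringr, and %>% via dplyr.

```r
# Assumed setup, inferred from the functions used below (not shown in the source)
library(palmerpenguins)  # provides the penguins_raw data
library(janitor)         # clean_names()
library(stringr)         # str_detect(), str_view_all()
library(dplyr)           # re-exports the %>% pipe
```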
df <- penguins_raw
df
## # A tibble: 344 × 17
## studyName `Sample Number` Species Region Island Stage `Individual ID`
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 PAL0708 1 Adelie Pengu… Anvers Torge… Adult,… N1A1
## 2 PAL0708 2 Adelie Pengu… Anvers Torge… Adult,… N1A2
## 3 PAL0708 3 Adelie Pengu… Anvers Torge… Adult,… N2A1
## 4 PAL0708 4 Adelie Pengu… Anvers Torge… Adult,… N2A2
## 5 PAL0708 5 Adelie Pengu… Anvers Torge… Adult,… N3A1
## 6 PAL0708 6 Adelie Pengu… Anvers Torge… Adult,… N3A2
## 7 PAL0708 7 Adelie Pengu… Anvers Torge… Adult,… N4A1
## 8 PAL0708 8 Adelie Pengu… Anvers Torge… Adult,… N4A2
## 9 PAL0708 9 Adelie Pengu… Anvers Torge… Adult,… N5A1
## 10 PAL0708 10 Adelie Pengu… Anvers Torge… Adult,… N5A2
## # … with 334 more rows, and 10 more variables: Clutch Completion <chr>,
## # Date Egg <date>, Culmen Length (mm) <dbl>, Culmen Depth (mm) <dbl>,
## # Flipper Length (mm) <dbl>, Body Mass (g) <dbl>, Sex <chr>,
## # Delta 15 N (o/oo) <dbl>, Delta 13 C (o/oo) <dbl>, Comments <chr>
# gross!
names(df)
## [1] "studyName" "Sample Number" "Species"
## [4] "Region" "Island" "Stage"
## [7] "Individual ID" "Clutch Completion" "Date Egg"
## [10] "Culmen Length (mm)" "Culmen Depth (mm)" "Flipper Length (mm)"
## [13] "Body Mass (g)" "Sex" "Delta 15 N (o/oo)"
## [16] "Delta 13 C (o/oo)" "Comments"
# clean those names
df %>%
  clean_names() %>%
  names()
## [1] "study_name" "sample_number" "species"
## [4] "region" "island" "stage"
## [7] "individual_id" "clutch_completion" "date_egg"
## [10] "culmen_length_mm" "culmen_depth_mm" "flipper_length_mm"
## [13] "body_mass_g" "sex" "delta_15_n_o_oo"
## [16] "delta_13_c_o_oo" "comments"
A little regex first
😄 Pro tip: square brackets [] define a character class, which matches any single character listed inside them
# are there spaces, capital letters, parentheses, slashes, or hyphens in col names?
str_detect(names(df), "[\\sA-Z()/-]")
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE
# but we only want one answer, so wrap in any()
any(str_detect(names(df), "[\\sA-Z()/-]"))
## [1] TRUE
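To make the character class easier to read, here is a small sketch on a few made-up names (toy_names is hypothetical, not part of the penguins data). Each piece inside the brackets is one alternative for a single character: \\s is whitespace, A-Z is any capital letter, ( ) and / are literal, and the trailing - is a literal hyphen because it comes last.

```r
library(stringr)

# Hypothetical column names, made up for illustration
toy_names <- c("body_mass_g", "Body Mass (g)", "delta_15_n", "Date Egg", "o/oo", "a-b")

# TRUE wherever a name contains at least one character from the class
has_bad_chars <- str_detect(toy_names, "[\\sA-Z()/-]")
has_bad_chars
## [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE
```

Lowercase letters, digits, and underscores are not in the class, so names that are already in snake_case come back FALSE.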
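😄 Pro tip: str_view_all() to see every string match (requires htmlwidgets in stringr versions before 1.5.0; from 1.5.0 on, str_view() shows all matches and str_view_all() is deprecated)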
# let's see where our pattern matches
str_view_all(names(df), "[\\sA-Z()/-]")
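If the viewer isn't an option, a console-only alternative is to pull the matched characters out with str_extract_all(). This is a sketch on three of the raw names, typed in by hand so it runs on its own (bad_names is a hypothetical helper vector).

```r
library(stringr)

# Three of the raw column names, typed in by hand for illustration
bad_names <- c("Sample Number", "Body Mass (g)", "Sex")

# One list element per name, holding every character the pattern matched
matches <- str_extract_all(bad_names, "[\\sA-Z()/-]")
matches
## [[1]]
## [1] "S" " " "N"
##
## [[2]]
## [1] "B" " " "M" " " "(" ")"
##
## [[3]]
## [1] "S"
```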
😄 Pro tip: cmd+i to fix bad indentation
# snippet time! type "if" then hit shift+tab
# paste in our regex condition and our clean_names code
# try yours right below here:
# should look like this when you're done
if (any(str_detect(names(df), "[\\sA-Z()/-]"))) {
  df <- df %>%
    clean_names()
}
# highlight the if statement above then hit cmd+i to fix the indentation
# inspect
names(df)
## [1] "study_name" "sample_number" "species"
## [4] "region" "island" "stage"
## [7] "individual_id" "clutch_completion" "date_egg"
## [10] "culmen_length_mm" "culmen_depth_mm" "flipper_length_mm"
## [13] "body_mass_g" "sex" "delta_15_n_o_oo"
## [16] "delta_13_c_o_oo" "comments"
any(str_detect(names(df), "[\\sA-Z()/-]"))
## [1] FALSE
😄 Pro tip: cmd+shift+m to insert pipe
names(df) %>%
  str_detect("[\\sA-Z()/-]") %>%
  any()
## [1] FALSE
😄 Pro tip: alt+dash for assignment arrow
df <- df %>%
  clean_names()
😄 Pro tip: type fun then hit shift+tab
😄 Pro tip: cmd+f for find (and replace)
# type fun then hit shift+tab
# name it clean_if_bad_names
# one arg called x
# put our if statement in the body
# cmd+i to fix indent
# cmd+f to change df to x
# try yours right below here:
# should look like this when you are done
clean_if_bad_names <- function(x) {
  if (any(str_detect(names(x), "[\\sA-Z()/-]"))) {
    x <- clean_names(x)
  }
  x
}
# reset our df back to original with bad column names
df <- penguins_raw
# use our new function
df <- clean_if_bad_names(x = df)
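A quick way to check the function is to run it on one messy and one already-clean data frame. The sketch below repeats the function so it runs on its own; messy and tidy are hypothetical toy data frames, not the penguins data.

```r
library(janitor)
library(stringr)

clean_if_bad_names <- function(x) {
  if (any(str_detect(names(x), "[\\sA-Z()/-]"))) {
    x <- clean_names(x)
  }
  x
}

# Hypothetical toy data frames, made up for illustration
messy <- data.frame(
  `Body Mass (g)` = c(3750, 3800),
  Sex = c("MALE", "FEMALE"),
  check.names = FALSE  # keep the messy names exactly as typed
)
tidy <- data.frame(body_mass_g = c(3750, 3800), sex = c("MALE", "FEMALE"))

# Messy names get cleaned
names(clean_if_bad_names(messy))
## [1] "body_mass_g" "sex"

# Already-clean data passes through untouched
identical(clean_if_bad_names(tidy), tidy)
## [1] TRUE
```

Because the clean-up only runs when the pattern matches, calling the function a second time on its own output changes nothing.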
# type for then hit shift+tab:
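Once the for snippet has expanded, you fill in the loop variable, the vector, and the body. One way it might look when finished is sketched below, reusing the regex from earlier; some_names and flagged are hypothetical names made up for this example.

```r
library(stringr)

# Hypothetical names, made up for illustration
some_names <- c("Sample Number", "sex", "Body Mass (g)")

# Collect every name that contains a character from the class
flagged <- character(0)
for (nm in some_names) {
  if (str_detect(nm, "[\\sA-Z()/-]")) {
    flagged <- c(flagged, nm)
  }
}
flagged
## [1] "Sample Number" "Body Mass (g)"
```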
😄 Pro tip: make your own snippet
Make a snippet called ec with the following lines.
The contents of the snippet should be indented below the snippet name using the Tab key.
library(here)
library(ggplot2)
library(tidyr)
library(dplyr)
library(stringr)
library(purrr)
Everyday Carry - hit shift+tab after the ec below
ec
I use that as a lightweight version of library(tidyverse) when I don’t want or need to load all of the core tidyverse packages. This is particularly important in a production environment that needs to stay trim.
😄 Pro tip: If you are new to ggplot2, install the esquisse package. It comes with the ggplot2 builder addin, which lets you build a plot in a GUI and then gives you the code, too!
# click below here before starting the esquisse addin